# Reinforcement Learning Training

Thinkless 1.5B RL DeepScaleR

Thinkless is a large language model trained with reinforcement learning to choose adaptively between short-form and long-chain reasoning, which significantly reduces the computational cost of inference.

Large Language Model

A series of 7B-parameter language models from Xiaomi specialized for reasoning, with significantly improved mathematical and code reasoning achieved through optimized pre-training and post-training strategies.

Large Language Model

MiMo-7B-RL is trained with reinforcement learning on top of the MiMo-7B-SFT model and achieves performance comparable to OpenAI o1-mini on mathematical and code reasoning tasks.

Large Language Model

VL-Reasoner-7B is a multimodal reasoning model trained with GRPO-SSR that performs strongly across multiple multimodal reasoning benchmarks.

Transformers English
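The GRPO part of the method named above scores each sampled response relative to the other responses drawn for the same prompt. The following is only a minimal sketch of that group normalization, assuming scalar rewards and a population standard deviation. The function name `group_relative_advantages` and the `eps` parameter are hypothetical, and the SSR component and the model's actual training code are not represented here.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Assumption: population standard deviation; some implementations
    use the sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled responses to one prompt, two judged correct (reward 1.0)
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that beat the group average receive positive advantages and the rest receive negative ones, so no separate value network is needed to form a baseline.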

Timezero ActivityNet 7B

TimeZero is a reasoning-guided large vision-language model (LVLM) specifically designed for temporal video grounding (TVG) tasks. It uses reinforcement learning to analyze the dynamic relationship between video and language.

Timezero Charades 7B

TimeZero is a reasoning-guided large vision-language model (LVLM) specifically designed for temporal video grounding (TVG) tasks. It identifies temporal segments in videos corresponding to natural language queries through reinforcement learning methods.

The OpenChat v2 series is a family of language models based on LLaMA-13B and trained with a conditional weighted loss, surpassing ChatGPT on multiple benchmarks.

Large Language Model

Transformers English

Promptist is a reinforcement learning-based automatic prompt optimization tool designed for Stable Diffusion, transforming user input into model-preferred prompts.

Text Generation

Dqn SpaceInvadersNoFrameskip V4

This is a reinforcement learning agent based on the DQN algorithm, specifically designed to play SpaceInvadersNoFrameskip-v4, trained using the stable-baselines3 library.

Video Processing

Dqn Mountaincar V0 Zoo

This is a reinforcement learning agent based on Deep Q-Network (DQN), specifically designed to solve tasks in the MountainCar-v0 environment.

Dqn Mountaincar V0

This is a reinforcement learning agent based on Deep Q-Network (DQN), specifically trained to solve control problems in the MountainCar-v0 environment.
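For readers unfamiliar with DQN, the two core ingredients behind agents like the ones above are epsilon-greedy action selection and a bootstrapped temporal-difference target. The sketch below is a plain-Python illustration under stated assumptions, not the code used to train these agents: the function names are hypothetical, and the Q-values are passed in as lists rather than produced by a neural network.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        # No bootstrapping from a terminal state
        return reward
    return reward + gamma * max(next_q_values)

# MountainCar-v0 has three discrete actions and a reward of -1 per step
action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
target = td_target(-1.0, [0.0, 2.0, 1.0], done=False, gamma=0.5)
```

In a full DQN, `next_q_values` would come from a periodically updated target network, and the online network would be regressed toward `target` on minibatches sampled from a replay buffer.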

Dqn SpaceInvadersNoFrameskip V4

This is a DQN agent trained using the Stable Baselines3 library, specifically designed to play the SpaceInvadersNoFrameskip-v4 game.

Video Processing

Ppo BipedalWalker V3

This is a PPO agent model trained using the stable-baselines3 library, specifically designed for reinforcement learning tasks in the BipedalWalker-v3 environment.

PPO LunarLander V2

This is a reinforcement learning model based on the PPO algorithm, specifically trained for the LunarLander-v2 environment to safely control the lunar lander.
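The defining piece of PPO is its clipped surrogate objective, which limits how far a single update can move the policy. A minimal per-sample sketch follows, assuming the probability ratio and advantage have already been computed. The function name `clipped_surrogate` is hypothetical, and the clip range of 0.2 is a commonly used value rather than a setting confirmed for this model.

```python
def clipped_surrogate(ratio, advantage, clip_range=0.2):
    """Per-sample PPO objective: min(r * A, clip(r, 1 - c, 1 + c) * A).

    ratio is pi_new(a|s) / pi_old(a|s); the objective is maximized.
    """
    clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    return min(ratio * advantage, clipped_ratio * advantage)

# A large ratio with a positive advantage is capped at (1 + clip_range) * A
capped = clipped_surrogate(1.5, 1.0)
# A large ratio with a negative advantage is not capped, so it stays penalized
uncapped = clipped_surrogate(1.5, -1.0)
```

Taking the minimum makes the objective a pessimistic bound: the policy gains nothing from pushing the ratio beyond the clip range in the favorable direction, but still pays the full cost when it moves in the unfavorable one.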

Dqn LunarLander V2

This is a DQN agent trained using the stable-baselines3 library to solve reinforcement learning tasks in the LunarLander-v2 environment.

Ppo Pendulum V1

This is a reinforcement learning model based on the PPO algorithm, specifically designed to solve control problems in the Pendulum-v1 environment.

Ppo PongNoFrameskip V4

This is a PPO agent trained using the stable-baselines3 library, specifically designed to play the Atari game PongNoFrameskip-v4.

Video Processing

© 2025 AIbase